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Abstract 

We perform numerical simulations of lattice QCD with two flavors of dynamical overlap 
quarks, which have exact chiral symmetry on the lattice. While this fermion discretization 
is computationally demanding, we demonstrate the feasibility to simulate reasonably large 
and fine lattices by a careful choice of the lattice action and algorithmic improvements. Our 
production runs are carried out on a 16 3 x 32 lattice at a single lattice spacing around 
0.12 fm. We explore the sea quark mass region down to m s /6, where m s is the physical 
strange quark mass, for a good control of the chiral extrapolation in future calculations 
of physical observables. We describe in detail our setup and algorithmic properties of the 
production simulations and present results for the static quark potential to fix the lattice 
scale and the locality of the overlap operator. 
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1 Introduction 

Since lattice QCD emerged as a quantitative tool to study non-perturbative aspects of the strong 
interaction, enormous efforts have been made to calculate physical observables with controlled 
systematic uncertainties by large-scale simulations on increasingly finer and larger lattices. In 
particular, recent algorithmic improvements [1-8] as well as development of computer technology 
enable us to approach the chiral regime of QCD [9-17] and to include all three light flavors of 
quarks dynamically [9,10,14-18]. 

The remaining crucial step towards the simulation of QCD is to preserve chiral symmetry on 
the lattice. Chiral perturbation theory (ChPT) based on this symmetry provides a theoretical 
guidance to the chiral extrapolation of physical observables to the quark mass in the real world. 
The explicit symmetry breaking in the conventional lattice actions for quarks distorts chiral 
behavior of the observables and introduces additional free parameters into ChPT [19-24]. This 
makes the chiral extrapolation unstable unless one simulates a sufficiently wide region of the 
quark mass and the lattice spacing. Lattice operator mixing is another serious obstacle to 
precise calculations of hadronic matrix elements, such as the kaon B parameter. We also note 
that massive Wilson-type Dirac operators are not protected from their (near-)zero modes due 
to the symmetry breaking. Studies on their spectrum [25, 26] suggest that it is safe to simulate 
fine and large lattices with this type of fermion discretization. 

The five dimensional domain-wall formulation [27-29] restores chiral symmetry in the limit 
of infinitely large size L s in the fifth dimension, while it is L s /a times more costly with respect 
to the Wilson-type fermions. The RBC and UKQCD collaborations [14, 30] have been pursuing 
large-scale simulations employing L s /a~10-20, with which the symmetry breaking is reduced 
to a level of a few MeV in terms of the additive quark mass renormalization. 

There has been no large-scale simulations with negligible symmetry breaking, which is of 
course desirable and opens new possibility to study unexplored subjects such as the calculation 
of the chiral condensate and the pion mass splitting through the difference between vector and 
axial vector current correlators (VV — AA). The JLQCD collaboration has started such realistic 
simulations of QCD employing the overlap formulation [31,32]. Its Dirac operator is 



where m is the quark mass and i?w = 75 Dw is the Hermitian Wilson-Dirac operator with a large 
negative mass —rriQ. This formulation exactl}0 satisfies the Ginsparg- Wilson (GW) relation [33] 



and, hence, has exact chiral symmetry on the lattice [34-36]. Note that, in practical simulations, 
the lattice spacing should be sufficiently small to guarantee its locality. 

1 Dirac operators satisfying the GW relation approximately have been proposed in Refs. [37-39] and have been 
employed in simulations in Refs. [40,41]. 



D(m) 





{ 75 ,L>(0)} = — D(0) 75 D(0) 



(2) 
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The overlap fermions are however computationally more demanding than the domain-wall 
fermions. Simulations with dynamical overlap quarks have been limited to small and coarse lat- 
tices [42-46] . The main difficulty arises from the discontinuity of the overlap action Eq. ([T]) when 
an eigenvalue of changes its sign. This substantially impairs the efficiency of the commonly 
used Hybrid Monte Carlo (HMC) algorithm unless the time consuming reflection/refraction pro- 
cedure [43] is implemented. In our dynamical overlap simulations, we avoid this overhead by 
suppressing zero modes of i/\y with a modification of the lattice action proposed in Ref. [47]. 
While this prevents us from sampling different topological sectors in a single simulation, the 
expectation values of physical observables in the QCD vacuum can be estimated by simulations 
in fixed topological sectors [48,49]. 

In this article, we perform the first large-scale simulations with dynamical overlap quarks 
in two-flavor QCD. The above mentioned setup to fix the topology enables us to simulate a 
lattice spacing a ~ 0.125 fm on a 2 fm box, which is comparable to those in recent studies 
with other discretizations. By implementing recent algorithmic improvements [3-5], the quark 
mass is reduced down to m s /6 to control the chiral extrapolation of physical observables. As 
an example, we present our estimate of the lattice spacing through the Sommer scale r$ [50] 
extrapolated to the chiral limit. The locality of the Dirac operator is a non-trivial issue for the 
GW fermions and is directly checked on the generated gauge ensembles. Status report of these 
production runs can be found in Refs. [16,51-54]. Simulations to study the e-regime at a slightly 
fine lattice spacing have been already presented in Refs. [55-59]. 

This paper is organized as follows. We introduce our lattice action in Sec. EJ Section [3] 
is devoted to a description of our simulation algorithm. In Sec. HI we present our choice of 
simulation parameters and discuss algorithmic aspects of our production runs in detail. We 
present results for the static quark potential and the locality of the overlap operator in Sees. 
and EJ respectively. Our conclusions are given in Sec. [7J 

2 Lattice action 

We employ the overlap quark action Eq. (pQ) which can be rewritten in terms of the massless 
Dirac operator as 



The parameter mo should be adjusted so that the overlap operator has good locality properties. 
We set mo = 1.6, which was also employed in a previous simulation in quenched QCD around 
our target lattice spacing [60]. The locality with our simulation setup is checked on generated 
gauge ensembles in SecJU 

A major problem with the unsmeared Wilson kernel i/\y is the appearance of its (near-)zero 
modes on relatively coarse lattices. This makes simulations costly and possibly spoils the locality 




(3) 



D(0) 



m (1 + 75 sgn [H w (-m )}) . 



(4) 
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of D. One way to reduce the localized (near-)zero modes is the use of improved gauge actions 
leading to smooth gauge configurations [61]. In our simulations, we employ the Iwasaki gauge 
action [62] 



Sg = 



Pico Yl ^Retr[l-P^(x)]+ci ^ w^(x )^Retv[l- R^(x)}\ , (5) 

L x,fi<i/ x,n,v ) 



where /3 = 6/<7q, and and R^ are lxl and 1x2 Wilson loops in the (//, v) plane. Their 
weights are cq = 3.648 and C\ = — 0.331. In our preparatory study in quenched QCD [54], we find 
that (near-)zero modes are remarkably reduced and the overlap operator shows better locality 
properties by switching the standard plaquette gauge action to this improved action. 

We however need to rule out the appearance of exact zero modes in order to avoid the time- 
consuming reflection/refraction step [43]. To this end, we introduce two copies of unphysical 
Wilson fermions with the large negative mass —mo [47, 63, 64] and two copies of twisted mass 
ghosts [47] leading to the following auxiliary fermionic determinant 

a + \A l det [H w (-m ) 2 ] 

det A w = - — — L — — (6) 

det[H w {-m ) 2 + fi 2 } 

The numerator suppresses the zero modes during continuous evolutions of the gauge field such as 
HMC, whereas effects of high modes of i^w are cancelled by the denominator. The twisted mass 
parameter [i is tuned to compromise between the suppression of the zero modes and reduction 
of the (3 shift due to the unphysical fermions. It should be noted that these unphysical fields 
have a mass of 0(a~ 1 ) and hence they do not change the continuum limit of the theory. Their 
effects can be simply considered as a modification of the gauge action by 8S g = — tr[ln[Aw]]- 

The auxiliary determinant det [Aw] fixes the net topological charge Q during the HMC 
update. We note however that local topological fluctuations are not suppressed in this setup: 
actually the topological susceptibility is calculable in a topological sector as demonstrated in 
Ref. [65]. The correction due to the fixed global topology can be considered as a finite size 
effect, which is suppressed by the inverse space-time volume 1/V [48,49]. In addition, the 
expectation values of physical observables in the QCD vacuum can be estimated by studying 
their Q dependence from simulations in fixed topological sectors [48,49]. The conventional setup, 
on the other hand, can sample different topological sectors through lattice dislocations, i.e. 
discontinuities of a gauge configuration. They occur by short distance statistical fluctuations 
and also by the appearance (or disappearance) of more physical topological objects, such as 
instantons, of order of lattice spacing. Both of these effects yielding the topological tunneling 
are increasingly more suppressed as the continuum limit is approached. Our setup to fix the 
topology (or modified algorithms such as in Refs. [66,67]) is an interesting alternative in future 
simulations near the continuum limit. It is also noteworthy that our setup provides a framework 
useful to study the e-regime of QCD, as demonstrated in Refs. [57-59]. 
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3 Simulation algorithm 



3.1 Multiplication of overlap operator 

A central building block in our HMC program is the multiplication of the overlap operator D(m) 
to a given quark field vector 4>. We evaluate the sign function sgn[i?^] in D(m) with the low 
mode preconditioning. Namely, we introduce a threshold Aw,th in the spectrum of Hyj and 
normalized eigenmodes ut (k = l, . . . ,N ep ) with their eigenvalues |Aw,fc| <<W,th are determined 
by the implicitly restarted Lanczos algorithm. We denote the number of the low modes thus 
extracted by N ep in the following. These modes are projected out in the multiplication of 
sgn[# w ] 



where P\ ow = X/fc=i Uk u k ls ^ ne P r °j ec ti° n operator on to the eigenspace spanned by {u^}. We 
also determine the largest eigenvalue | Aw, max I- The contribution of higher modes sgn[iijy](l — 
P\ ow )(f) is then estimated by a minmax rational approximation 



with the Zolotarev coefficients pi, qi for the range [A\v,th> Aw.max] [68,69]. The multiple inversions 
for (-ff w + <7;) _1 = 1) • • • > ^poic) can be carried out simultaneously by the multi-shift conjugate 
gradient (CG) algorithm [70,71]. We keep Aw,th and N po \ c constant while N cp varies as a result 
of fixing Aw,th- This together with small statistical fluctuation of Aw.max makes the accuracy 
and the computational cost in the evaluation of sgn[i?w] stable. 

3.2 Overlap solver 

We need to solve the linear equation 



for a given source vector b in preparations of pseudo-fermions and calculations of the Molecular 
Dynamics (MD) forces in HMC. In the early stage of our simulation, we evaluate .D(m) -1 by the 
nested four dimensional (4D) CG algorithm, which consists of multi-shift CG for (i? w + qi)~ l as 
the inner solver and CG for normal equations (CGNE) to evaluate D(m) _1 as the outer solver. 
As the outer solver proceeds, the computational cost of the inner solver can be substantially 
reduced by adjusting its stopping condition 



sgn[H w }4> = ^sgn[A w ,fc] u k {u\4>) + sgn[H w ](l - F\ ow ) 4>, 
k=l 



(7) 




(8) 



D(m) x = b 



(9) 



\(H^ + qi ) Xi -b\ 2 



his 



(10) 
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where i is the iteration count for the outer solver [72,73]. We employ the relaxed stopping 
condition outlined in Ref. [73]. This is based on the idea that, as the outer solver proceeds, 
the correction to the solution vector \xi— Xi—\\ becomes smaller and we do not have to evaluate 
D{m) with too much accuracy. Its implementation depends on the outer solver algorithm and, 
for CG(NE), the condition is loosened as 



ocyCi, Ci = Ci-i + 



(11) 



where 7*j_i = Dxi-\ — b is the residual for the outer solver at (i— l)-th iteration. It was ob- 
served on small lattices [73] that this relaxation leads to roughly a factor of 2 reduction in the 
computational cost. 

For a further improvement in the solver performance, we later switch to the five dimensional 
(5D) solver proposed in Refs. [74-76]. In the case of iVp i e = 2 for instance, we consider the 



following 5D matrix to solve Eq. (jUJ) 








/ H w -^/cfi 




o \ 




~\/02 -Hw 






M 5 (m) = 


Hw 











-Hw 


\/pl 






s/pi 


i?7 5 + p H w J 



An 


Al2 


A 2 i 


A22{m) 



where pi and q\ are the coefficients in the Zolotarev approximation Eq. 
m)/(2mo — m). The Schur decomposition 



1 

AoiA 







21 ^ll 1 



An 
S(m) 



1 A^Au 
1 



(12) 



and R = (2mo + 



(13) 



7TI \ — ^ 

m o - tt 75-D(m). 



contains the Hermitian overlap operator as the Schur compliment 

S(m) = A 22 (m) - A 2 i A u l A 12 
Its inverse x = S{m)~ l b can be evaluated by solving the 5D equation 

M 5 (m) x 5 = 6 5 , x 5 = ( ^ J , b 5 = ( ^ 



(14) 



(15) 



We observe that the convergence of this solver can be improved by a preconditioning based 
on the 5D structure M^~ 1 M§, where M5 is obtained from M5 by setting all gauge links to zero. 
Note that M5 is local, uniform in space-time, and easy to invert through its LU decomposition 
and forward/backward substitutions. This is naturally incorporated into the even-odd precon- 
ditioning [76] 

(1 " M 5 , eo M^ M 5 , oe )x 5 , e = 6' 5ie , 6' 5ie = M^ e (b 5 , e - M 5 , eo ^65,0), (16) 
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since -M" 5 ee ( 00 ) = M 5ee ( 00 ) where the subscripts "e" and "o" represent even and odd sites. It 
turns out in Seed] that this preconditioned solver is roughly a factor of 3 faster than the relaxed 
4D solver. 

The low mode preconditioning Eq. ([7]) is however not straightforward with the even-odd 
preconditioning. We switched it off in simulations with the 5D solver in this article, but it is 
implemented in our latest simulations of three-flavor QCD [15,16]. 

3.3 HMC 



In our implementation of HMC, we employ a combination of the Hasenbusch preconditioning [4,5] 
and the multiple time scale integration for MD [3], which has been shown to be very effective in 
simulations with Wilson-type fermions [77,78]. In our HMC program with the 4D overlap solver, 
which is referred to as "HMC-4D" in the following, the fermionic determinant is expressed as 



det [D( 



TO 



det [D(m') 2 ] det 



D(m) 2 
D(m') 2 



(17) 



where to' is the mass of the Hasenbusch preconditioner. Two determinants as well as det[A\y] 
from the extra- Wilson fermions are evaluated by introducing three pseudo- fermions <pi, </>2, 0w : 
namely 



det [Dl 



TO 



/\21 



det 



D(m) 2 
D{m') 2 



[d<j>\][d4>i}e- s \ 
[#t][# 2 ]e- 52 , 



51 = 4{D( m yD(m')] 0i, (18) 

5 2 = 4 D ( m ') [D(m)^D(m)^ 1 Dim'^fc, (19) 



and 



det [A w ] = J [# w ][# w ]e" 5w , 



(20) 
(21) 



where D tm {—rriQ^) = D\y(— too) + ifi'js is the Dirac operator for the twisted mass Wilson 
fermions. 

The expression of the force associated with the overlap pseudo-fermion is already available 
in Refs. [42,43,45,46]. Here, we explicitly write down only the simplest one from S\ 



dSi 



TOq — TO 



/2 ^ f dsgnjtfw] ^ + 75 rfsgnjgw] 



■0! 



dr dr 

where ipi = {D(m')^ D(m')} 1 <j>\ and the derivative of the sign function is given by 



(22) 



dsgnjgw] 
dr 



dHyj 
dr 



pole 



po+ E 



Pi 



EPi Hyj \ dH w 



1=1 "w 



,(23) 
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Hence, we need to evaluate {D(m')^ D(m')} 1 by the 4D relaxed CGNE and also have to invoke 
the multi-shift CG to calculate dsgn[Hyj]/dr. The forces F\ and F2 from the overlap pseudo- 
fermion actions Si and S2 are much more expensive to evaluate than Fw from S\y and the gauge 
force F g . 

The use of the multiple time scale integration is therefore crucial to reduce the computational 
cost of the MD evolution. We employ the following three nested loops 



Ti (At) 



2 
At 
~2~ 



9 nn f^T 

Tp,i -r 



T g (At) = T P)9 ( — ) T PiW (^) T V (At) T p , w ( ^f) , T P , g ' Xj 



(24) 
(25) 
(26) 



where Tjj(At) evolves the gauge field by the MD step size At, and Tp t x{Ar) updates the 
conjugate momentum with the MD force Fx (X = l, 2,W, g). We put T Pt w together with Tjj 
in the inner-most loop, otherwise the suppression of the (near-)zero modes of fails by a 
mismatch between the updated gauge configuration and Fy/. The integration scheme can be 
largely accelerated by an appropriate choice of positive integers and r g when the magnitude 
of the forces are well separated from each other. This point is one of the central issues in the 
next section. 

The reflection/refraction step is designed to deal with the discontinuity in the Hamiltonian 
along the MD evolution of the gauge field, and hence has to be included into Tjj in the inner-most 
loop. This step requires a significant computational cost to accurately locate at which point of 
the MD evolution the sign of an eigenvalue of Hyj changes. In our simple implementation, 
moreover, it involves two inversions of the overlap operator to evaluate the Hamiltonian just 
before and after the change of the signal This step therefore could lead to a considerable slow 
down of simulations. The determinant det[A\y] enables us to avoid this serious overhead. 

In HMC with the 5D solver, which we call "HMC-5D" in the following, we have to modify 
the implementation of HMC due to the lack of the low mode preconditioning. The coefficients 
pi and qi for the 5D solver are determined with an appropriate choice of Aw,th and N po \ c , which 
are kept fixed during our simulation. This could lead to a sizable error in sgnfffyv], when 
has eigenvalues smaller than A\y,th- I n order to keep the accuracy of sgn[i?\y] comparable to 
that in HMC-4D, the fermionic determinant is modified as 



det [D{ 



111 



det [D'( 



m') 2 ] det 



D'{m) 2 
D'(m') 2 



det 



D(m) 2 
D'(m)< 



(27) 



where D' represent the less accurate overlap operator without the low mode preconditioning. 
The first two determinants are dealt with by the usual HMC steps, whereas the last factor is 



Improved implementations of the reflection/refraction step are proposed in Refs. [79, 
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taken into account by the noisy Metropolis test [81]. The probability is evaluated as 

P = mm{l,e~ dS }, dS=\W~ 1 [U new ]W[U old }{;\ 2 -\£\ 2 , (28) 

where £ is a random Gaussian noise vector and VF[^new(oid)] is D{m)/D'{m) on the final (initial) 
gauge configuration. Therefore, this step needs to invert both of D(m) and D'(m) and spends 
a significant fraction of the total CPU time. 

Another difference from HMC-4D is that (D> D)^ 1 is evaluated by invoking CGNE twice 
for Eq. (|16|) . since no 5D representation is available for (D^D)~ l . It turns out that, however, 
CGNE is effective in inverting the preconditioned 5D matrix in Eq. (|16p and switching CGNE 
to MINRES does not lead to a substantial reduction in the computational cost. 

3.4 Machine 

Our numerical simulations are carried out on the supercomputer system at KEK. This is a com- 
bination of 16 nodes of Hitachi SR11000 and 10 racks of IBM Blue Gene/L, whose peak speeds 
are about 2.15 and 57.3 TFLOPS, respectively. Our measurement of the static quark potential 
is inexpensive and is carried out on the SR11000 computer. The configuration generation with 
the above mentioned HMC algorithm is computationally intensive and is carried out on Blue 
Gene/L. To increase the sustained speed as much as possible, we employ an assembler code 
developed by the IBM Japan for the multiplication of -Dw- This code makes the best use of 
the so-called double FPU instructions, which process complex-arithmetic operations in double 
precision effectively using two arithmetic pipelines. It also has a good scalability with respect 
to the number of computing nodes by using a low-level interface for inter-node communications. 
We find that this assembler code is roughly a factor of 3 faster than our naive Fortran code. 

4 Production Run 

4.1 Simulation parameters 

We simulate QCD with two flavors of degenerate up and down quarks employing the lattice 
action introduced in Sec. [21 The twisted mass for the auxiliary determinant det[A\v] is set to 
/i = 0.2 from our studies in quenched QCD [47,54]. Numerical simulations are carried out on a 
Ng x N t = 16 3 x 32 lattice at a single value of /? = 2.30, which is expected to correspond to our 
target lattice spacing 0.125 fm. The box size L should be around 2 fm. In the trivial topological 
sector, we simulate six sea quark masses listed in Tables [T] and [2j From our analysis of the 
meson spectrum [82], this choice covers a range from m s down to m s /6 in physical units. The 
statistics are 10,000 HMC trajectories at each quark mass with the unit trajectory length set to 
0.5. At m = 0.050 which roughly corresponds to ~ m s /2, we also accumulate 5,000 trajectories 
in the non-trivial topological sectors with Q = —2 and —4. The initial gauge configuration for 
these runs is prepared as in Ref. [83]. The generated gauge configurations are stored on disks 
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m 


Q 


Apolc 




m! 






r 9 


HMC traj. 


-Phmc 


time [min] 


0.015 





10 


0.108 


0.2 


9 


4 


5 


2800 


0.875(7) 


50 


0.025 





10 


0.108 


0.2 


8 


4 


5 


5200 


0.900(3) 


41 


0.035 





10 


0.108 


0.4 


6 


5 


6 


4600 


0.739(7) 


28 


0.050 





10 


0.108 


0.4 


6 


5 


6 


4800 


0.781(5) 


23 


0.070 





10 


0.108 


0.4 


5 


5 


6 


4500 


0.818(7) 


20 


0.100 





10 


0.108 


0.4 


5 


5 


6 


4600 


0.852(5) 


17 


0.050 


-2 


10 


0.108 


0.4 


6 


5 


6 


1100 


0.762(13) 


24 



Table 1: Parameters in the production simulations with HMC-4D. The right-most column shows 
the CPU time per 1 HMC trajectory on one rack of Blue Gene/L. 

every 10 trajectories for future measurements of physical observables. These parameters are 
summarized in Tables [T] and El 
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-^polc 


^W,th 


m! 




r 4> r 9 


HMC traj. 


-Phmc 


time [min] 
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13 
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7200 


0.686(6) 
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0.2 
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19 
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0.4 


9 


6 8 


5200 


0.879(5) 


15 
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10 


0.108 


0.4 


8 


6 8 


5500 


0.917(4) 


13 


0.100 





10 


0.108 


0.4 


7 


6 8 


5400 


0.926(3) 


11 


0.050 


-2 


10 


0.108 


0.4 


9 


6 8 


3900 


0.882(5) 


15 


0.050 


-4 


10 


0.108 


0.4 


9 


6 8 


5000 


0.872(5) 


15 



Table 2: Parameters in the production simulations with HMC-5D. The right-most column shows 
the CPU time per trajectory. 
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M 


m 


Q 


-Wpole 


■^W.th 


m! 




r 4> 


r 9 


HMC traj. 


-Phmc 


time [min] 


2.45 


0.0 


0.090 




12 


0.096 


0.4 


6 


5 


6 


300 


0.78 


46 


2.35 


0.2 


0.110 





10 


0.144 


0.4 


5 


5 


6 


1200 


0.87 


12 



Table 3: Parameters in the test runs with and without the determinant factor Eq. ([S]). The 
right-most column shows the CPU time per trajectory. 



In the course of our calibration of the lattice spacing, we investigate the impact of the 
auxiliary determinant on the computational cost at a slightly finner lattice spacing at (fl, /i) = 
(2.35,0.2). A single quark mass around m s is simulated without the determinant (namely, 
^ = 0.0) but with the reflection/refraction procedure. This run can be compared to one of our 
simulations with ju = 0.2 at a similar lattice spacing, which has been reported in Ref. [58]. We 
summarize parameters of these runs in Table [3j 
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Figure 1: Monte Carlo history of plaquette (left panels) and iVii 
1,000 trajectories with HMC-4D. 



(right panels) during first 
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Figure 2: Autocorrelation times T inti pi aq (left panel) and Ti nti i nv (right panel) as a function of 
quark mass m. Data for HMC-5D are slightly shifted in the horizontal direction for clarity. 



4.2 Autocorrelation 

We first discuss the autocorrelation in our simulations to fix the bin size used in the jackknife 
analysis in the subsequent sections. In Fig. [H we plot the time history of the plaquette and the 
number of iterations AW to invert D(m) in the simulations with HMC-4D. It is observed that 
N[ nv shows longer and larger fluctuations as m decreases, while such a tendency is not clear in 
the plaquette. 

A conventional measure of the autocorrelation of an observable O is the integrated autocor- 
relation time Ti n t 5 defined by 

1 x 

nm,o = 2+J>^ T ) (29) 

r=l 
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Figure 3: Jackknife error of combined data of plaquette as a function of bin size. Left and right 
panels show data at Q = and at m = 0.050. 
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Figure 4: Jackknife error of iV"i nv in simulations with HMC-4D. 



through the normalized autocorrelation function 



Po{r) 



ro(r) 
r o (0) 



r c (r) = <(O(r ) - (O))(O(t + r) - (O))) T0 , 



(30) 



where the Monte Carlo average (over tq) is explicitly indicated by the bracket (• • • )( T0 )- Practi- 
cally, the sum in Eq. ([29]) has to be truncated at a certain value of r = r max . In this analysis, 
we employ the condition adopted in Ref. [7]: namely, r max is set to the minimum value of r 
satisfying 



Poij) - Spo(t) < 0, 



(31) 



where 5po(r) is the standard deviation of po(j) estimated by the Madras-Sokal formula [84,85]. 
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0.015 


0.025 


0.035 


0.050 


0.070 


0.100 


plaq. 


0.614789(10) 


0.614777(9) 


0.614764(8) 


0.614718(9) 


0.614709(10) 


0.614667(9) 



Table 4: Average of plaquette from the production runs with Q = 0. 



(m sea ,Q) 


(0.050,-2) 


(0.050,-4) 


plaq. 


0.614762(13) 


0.614704(15) 



Table 5: Average of plaquette at Q^0. 



In Figl21 we plot 

7~int,piaq fc> r the plaquette and Tin^mv 

for A^nv as a function of m. It turns out 
that Ti nti piaq is not large (about 5 trajectories) and has small m dependence, probably because it 
is a local quantity. On the other hand, N- my is expected to be sensitive to the low modes of D(rn), 
and in fact Ti nti i nv increases to 0(100-200) trajectories at small quark masses m< 0.035. While 
these observations are consistent with Fig. [H it is clear that our statistics are not sufficiently 
large to estimate Ti ntj i nv accurately at small m. 

Therefore, we also check the bin size dependence of the jackknife error in Fig. [3l where two 
data of the plaquette obtained with HMC-4D and 5D are combined. Roughly speaking, the 
jackknife error becomes stable when the bin size is > 100-200 trajectories irrespective to the 
choice of m and Q. At similar bin sizes, the jackknife error of Ai nv also becomes stable as shown 
in Fig. HI From these observations, we employ the bin size of 200 trajectories throughout this 
article, unless otherwise stated. Although only a limited number of bins are available to analyze 
algorithm-dependent quantities, such as Aj nv , in simulations with HMC-4D at (m, Q) = (0.015, 0) 
and (0.050,-2), it turns out that decreasing the bin size leads to an even smaller statistical 
error for such quantities. For a reference, the plaquette averaged over our full statistics and its 
jackknife errors are summarized in Tables H] and [SJ 

4.3 Spectral density of H w and sgn[If w ] 

In Fig. we compare the low-lying spectrum {Aw} of Hyj in our test runs listed in Table 
Without the auxiliary determinant det[A\y], I Aw I nas an almost uniform density in the investi- 
gated region |Aw|^0.1. This causes the reflection (refraction) occurring roughly 130 (14) times 
per 100 trajectories at our lattice spacing which is only slightly coarser than those in recent 
simulations with Wilson- type actions. In contrast, near-zero modes are remarkably suppressed 
by the determinant. We can safely turn off the reflection/refraction without a serious loss in the 
acceptance rate and observe about a factor of 4 reduction in CPU time as in Table El 

In Fig. [6l we plot the eigenvalue distribution in our production simulations at several quark 
masses. Near-zero modes are successfully suppressed also in these high statistics runs, and 
resulting distribution has small m dependence, as it should. Figure [7] shows the MD evolution 
of the lowest eigenvalue Aw,min- We observe that Aw,min approaching to Aw = is eventually 
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Figure 5: Eigenvalues of Hyj smaller than 0.04 at each HMC trajectory and its histogram. Left 
and right panels show data from test runs at ((3, fi) = (2.35, 0.2) and (2.45,0.0) (namely with 
and without the determinant det[A\y]). We plot data during the first 300 trajectories in each 
simulation. 



0-08 




Figure 6: Histogram of low-lying engenvalue |A\v| in production runs at m = 0.015 (thick solid 
line), 0.050 (thin solid line), and 0.100 (thin dot-dashed line). The small panel shows the same 
data near Aw = 0. 



bounced back by the repulsive force from the potential barrier generated by the determinant. 

The suppression of near-zero modes enables us to take a relatively large value for the thresh- 
old A\v,th an d thus small N po \ e in the multiplication of D. We set A\y,th = 0.108 at all values 
of m based on the small m dependence of the low mode distribution in Fig. O There are only 
a few eigenvalues below this threshold, and hence it does not take too much time to determine 
such low-lying modes (A\y,fc> fw,fc) (k = l,... ,N ep ) with a strict condition 

|(fl"w-Aw,fc)«w,fc| < 10" 13 (32) 
for the low- mode preconditioning Eq. ((Jj). We set N po \ c = 10 with which the accuracy of the 
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HMC trajectory 

Figure 7: Lowest eigenvalue of Hyj during MD evolution in first 50 trajectories at /3 = 2.30 and 
m = 0.050. 



Zolotarev approximation Eq. (jHJ) is typically |sgn[if\y] 2 — 1| ~ 10 7 . 

In simulations with HMC-5D, we consider saving the CPU time by loosening the approxi- 
mation of sgn[i?w] for D'(m') and D'{m) in Eq. (j27H . since the noisy Metropolis test guarantees 
that gauge configurations are generated with the fermionic determinant of the accurate over- 
lap operator det[D(ra) 2 ]. We set (Aw.thj -^poie) = (0.024, 10) for the Hasenbusch preconditioner 
D'(m'). Since the error of the rational approximation scales as « exp[— A\y,th ^poie]> this is less 
accurate compared to D{m!) with (0.108, 10) for HMC-4D. While Fig. [6]shows that there appear 
a non-negligible number of eigenmodes below Avyth = 0.024, HMC-5D achieves the reasonable 
acceptance rate listed in Table [2j This suggests that the error due to the rough approximation in 
D n s as well as that due to the lack of the low mode preconditioning are stochastically cancelled 
(in part) between Tp t \ and Tp^ in the MD integration. We set (A\v,th, ^poic) = (0.0024, 16) for 
D'(m), since relatively large Aw,th leads to a substantially poor acceptance rate for the noisy 
Metropolis test. 



4.4 and overlap solvers 

Our HMC programs involve various stopping conditions. We need to specify conditions for the 
4D overlap solver 

\Dx-b\/\b\ < e 4D ,x (33) 
and those for the 5D solver 

| (1 - MjTgg M 5 ,eo M^o M 5 ,oe)x 5 ,e " &5,el / \ b \ < e 5D,X, (34) 

where the subscript X is "f" or u H n representing the condition for the calculation of the MD 
force or the Hamiltonian. Note that esD,x is the condition fulfilled by the 5D preconditioned 
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Figure 8: Measures of reversibility violation Afj jave (left panel) and Ah (right panel) for HMC- 
4D as a function of stopping conditions £ — £4D,f- 



solution vector x^^ e . Our numerical test suggests that £5D,x should be stricter than £4D,x by one 
or two orders of magnitude so that the accuracy of the 4D piece x in X5 is comparable to that 
by the 4D solver with £4D,x- 

The stopping condition for the multi-shift CG inside the 4D overlap solver is automatically 
determined by Eq. (jlip except for the initial residual r$ = D XQ — b. An appropriately strict value 
< 10~ 6 is employed to calculate vq with the given initial guess xq. Therefore, we only need to 
specify conditions for multiplications of D and calculations of dsgn[Hyj]/dT by Eq. (|23p 



\{H§f + m)x - b\ I \b\ < e^ x (I =, 1, . . . , iV pole ) (35) 

with X = "f" or ll H\ Here the condition in HMC-4D (5D) is represented by the subscript 
Y = "4D" ("5D"). In our production run, we employ a rather strict condition ey f ^ = ey j H = 10 -10 
for calculations of the Hamiltonian in order to carry out the Metropolis tests accurately. 

The choice of €y| and eyf is crucial to save the computational cost but should be strict 
enough to make our HMC reversible. The conventional measures of the reversibility violation 
in the gauge links U x u an d the Hamiltonian H are 



Af7,ave = |^(r + 0.5-0.5)-C/^(r)|V(36iV|iV t ), (36) 



x,/i,a,l 



A H = \H(r + 0.5 - 0.5) - H{t) |, (37) 

where a and b are color indices and note that our unit trajectory length is 0.5. We pick up ten 
gauge configurations separated by 200 trajectories, and calculate A[/ iave and Ah by updating 
them by one trajectory and then evolving back with the reversed momenta. Figure [8] shows these 
measures in HMC-4D, for which we set e^D f = £4D,f f° r simplicity. We observe a monotonous 
decrease in both measures and small m dependence of their size. By employing 

e2S,f = e4D,f = 10" 7 , (38) 
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Figure 9: Residual of overlap solvers as a function of the number of the Dyj multiplication at 
m = 0.025. Thick and thin lines show results for the 5D and 4D solvers. For the 5D solver, we 
plot the residual for the 4D piece x of the 5D solution vector x§, namely r = D(m)x — b. 



at all quark masses, the reversibility in our simulations is preserved at a level of A[/ jav0 < 10 -8 
and Ah ^ 10 -4 , which are comparable to those in previous large-scale simulations with the 
Wilson-type actions [86-88]. A similar study for HMC-5D leads us to set 



t 5D,f 



10~ 6 , e 5 D,f = 10- 7 (39) 



in order to maintain the reversibility at the same level to HMC-4D. 

In Fig. [9j we compare the convergence of the 4D and 5D solvers by plotting the normalized 
residual |r| 2 /|6| 2 as a function of the number of the Z>\y multiplication iV mu it. We take m = 0.025 
and turn off the low mode preconditioning by setting Avyth = 0-0 for a fair comparison between 
the 4D and 5D algorithms. The relaxed condition Eq. pip works well on our 2 fm box and 
achieves about a factor of 2 speed up compared to the standard CG. The 5D solver is even 
faster by about a factor of 3 mainly due to the preconditioning of Eq. (|16p . We observe an 
acceleration of similar magnitude also at other quark masses. 

It is an interesting issue how the computational cost scales as a function of m. The itera- 
tion count of our overlap solver N{ nv depends on m through eigenvalues Afc(m) of D(m). For 
simplicity, we use the following approximation for low-lying eigenvalues 

A fc (m) = m + i\ k (0) (40) 

where Afc(0) is the k-th eigenvalue of the massless Dirac operator -D(O) projected to the imaginary 



axis 



~ Im[A fe (0)] 

Afc(0) " l-Re[A fc (0)]/2' (41) 
and we ignore the small correction factor 1— m/(2mo) ~ 1 in Eq. ([3]). The m dependence of iVi nv 
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Figure 10: Quark mass dependence of CG iteration count iVi nv for 4D overlap solver. Left 
and right panels show data for the Hasenbusch preconditioner D(m') and the operator with the 
physical mass D{m). The dashed line shows the fit to Eq. (I43p . 



algorithm 


Qnv 


Qinv 


HMC-4D 


22.86(9) 


0.869(1) 


HMC-5D 


159(8) 


0.64(2) 



Table 6: Fit parameters in Eq. (|43|) for two algorithms HMC-4D and 5D. 



in HMC-4D is then expected to be 

AW oc =- (42) 

(m 2 + Ai(0) 2 ) Q / 2 V ' 

with the power a < 1 for CGNE. 

The mass parameter in Eq. (I42p should be m! for the Hasenbusch preconditioner D(m'). 
Since we take large values for m', N[ nv is governed by m! and has small m dependence as shown 
in Fig. [ToJ On the other hand, iVi nv to invert D(m) increases monotonously as m decreases. 
Data at four heaviest quark masses are reasonably described by the scaling law Eq. (|4~2|) with 
Ai neglected 

N inv = c inv m- a ^. (43) 
Fit parameters are listed in Table El We note that the power a; nv is close to its maximum value 

C^inv — 1 • 

Our data of A^; nv at m< 0.025 however clearly deviate from this fit. This can be considered 
as a manifestation of finite size effects as we approach the e-regime by decreasing m with the 
fixed spatial extent L [57,58]. Figure fTTI actually shows that the magnitude of A&(0) in units of 
m rapidly increases toward smaller m. Namely, at heavy quark masses m > 0.050, £>(m)t£>(m) 
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Figure 11: Projected eigenvalue Afc(O) as a function of m. We plot data for five lowest-lyin^ 
modes. The dashed line shows \ = m. 
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Figure 12: Quark mass dependence of CG iteration count iVi nv for 5D solver. Left and right 
panels show data for D'(m') and D'(m), respectively. 



for CGNE has dense low- lying eigenvalues |Afc(m)| 2 near m 2 and, hence, mis a good parameter 
to characterize its condition number. On the other hand, |Afc(m)| 2 becomes sparse and deviates 
from m 2 as m decreases. It is likely that this rapid change in the low mode distribution distorts 
the m dependence of A^ nv from the simple scaling Eq. 

The influence of Afe(0) to A^; nv is less clear in HMC-5D, since the matrix to be inverted is 
5D preconditioned matrix rather than D(m). We only note that, as seen in Fig. IT2l N[ nv for 
the preconditioner D'{m ! ) is mainly determined by m' and iVi nv for D'(m) shows a somewhat 
weaker power scaling with parameter listed in Table El 

4.5 Properties of HMC 

The parameters for the Hasenbusch preconditioning, Eqs. (|17|) and (|27|) . and the multiple time 
scale MD integration, Eqs. (|24f) - (|26p . are listed in Tables [T] [21 The gauge force F g is known to 
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Figure 13: Time history of MD forces F%, F\, F g and Fw at m = 0.015 (four left-most panels) 
and 0.050 (four right-most panels) with HMC-4D. Two lines in each panel show the average and 
maximum value over the degrees of freedom (color, space-time direction and coordinates). 





HMC-4D 




HMC-5D 




m 


Q 


AH 


e -AH 


AH 


e -AH 


0.015 





0.0603(72) 


0.9957(64) 


4618(4617) 


0.998(14) 


0.025 





14(14) 


0.9954(32) 


0.1142(52) 


0.9997(59) 


0.035 





0.2224(86) 


1.016(11) 


0.068(15) 


0.9964(42) 


0.050 





0.1846(84) 


1.0008(83) 


0.0519(39) 


0.9982(45) 


0.070 





0.198(99) 


0.9989(52) 


0.0260(29) 


0.9950(24) 


0.100 





0.0692(33) 


0.9963(34) 


0.0207(33) 


0.9989(30) 


0.050 


-2 


0.37(16) 


0.978(15) 


0.0432(80) 


1.0082(61) 


0.050 


-4 






0.0601(55) 


0.9934(52) 



Table 7: Average of AH and e 



be generally larger than the fermionic force(s). Only ml needs a non-trivial tuning to make a 
hierarchic structure among the MD forces F2, F\ and F g . As discussed in Refs. [5,78], m! should 
be decreased for smaller m to avoid a too large condition number for the preconditioned Dirac 
operator D(m) / D{m'). This is why m! is set to a smaller value at to<0.025 than others. While 
we have not done further fine tuning of m' , Figs. [13] and [14] show that the forces F\, F2 and 
F g are well separated from each other with our choice of mf. This enables us to use the ratios 
of the step sizes, namely and r g , around 4-8, which considerably reduce the computational 
cost of the MD integration with the acceptance rate kept in a reasonably high range > 0.7. 

The same figures show that Fy? from the extra- Wilson fermions exhibits large statistical 
fluctuations. It becomes as large as F g probably due to the appearance of small eigenvalues of 
Hyj. This is another reason why we update Fyy in the inner- most loop in Eq. (1260 . in addition 
to its consistency with the updated gauge field described in Sec. [3] 
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Figure 14: Time history of MD forces at m = 0.050 with HMC-5D. 



le+05 



50000 - 



i i i | i i i |-p i i | i i i | i i i r 




1000 2000 3000 4000 5000 

HMC traj. 

Figure 15: Time history of AH in simulation at m = 0.025 with HMC-4D. We have a spike at 
2151-th trajectory. 



We denote the change in the Hamiltonian due to the discretized MD integration by AH and 
its average is summarized in Table [71 The area-preserving property of MD leads to the following 
(in)equality 

< ( e - AH ) = l, (44) 

where we explicitly indicate the value averaged over HMC updating by the bracket (•••). The 
inequality predicts that averaged AH is positive and this is the case in our simulations. Two of 
them are however dominated by huge spikes shown in Fig. [T5l Similar spikes have been observed 
also in previous simulations with Wilson-type fermions [89], and they may be attributed to 
instability of HMC with a large MD step size [90]. The spikes in our simulations have rather 
simple origin as shown in Fig. [To! Hyj can develop a very small lowest eigenvalue leading to a 
spike in Fyj and hence in AH. This is why we take a larger value for r g in HMC-5D to be more 
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Figure 16: Maximum value of -Fw an d lowest eigenvalue of Hyj during MD evolution where a 
spike in AH is observed. 




Figure 17: Acceptance rate -Phmc as a function of AH in simulations with HMC-4D (left panel) 
and HMC-5D (right panel). The solid line is the expectation of Eq. (|45p . Data at (m,Q) = 
(0.025, 0) with HMC-4D and at (0.015, 0) with HMC-5D are consistent with the expectation 
within their huge error and hence are omitted. 



robust against the spikes with a smaller MD step size for the calculation of Thanks to the 
determinant det[A\y], however, the number of such huge spikes is not large even in HMC-4D, 
at most a few per 10,000 trajectories. As a result, the equality Eq. (|44|) is fulfilled within 2 % 
accuracy without introducing the replay trick [6,91]. We also note that AH dependence of 
-Phmc is consistent with the expected form of the complementary error function 



-Phmc = erfc 
as plotted in Fig. [T71 



y/AH/2 . (45) 
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Q 





-2 


-4 


m 


0.015 0.025 0.035 0.050 0.070 0.100 


0.050 


0.050 


HMC-4D 
HMC-5D 


11.9(1) 7.6(4) 5.0(2) 4.5(2) 3.4(1) 3.1(1) 
6.53(4) 5.10(2) 4.41(2) 3.62(1) 2.98(1) 2.44(1) 


5.08(2) 
3.61(2) 


3.65(1) 



Table 8: Number of Dvv multiplications per trajectory in units of 10 . 




HMC-5D 




0.06 



Figure 18: Number of D w multiplications iV mu it in HMC-4D (left panel) and HMC-5D (right 
panel). Open symbols are iV mu i t in calculations of MD forces and in the Metropolis tests (MTs), 
whereas the filled symbol is their total. 



4.6 Simulation cost 

On a half rack of Blue Gene/L, the assembler code for the multiplication of D\\r achieves roughly 
28 % efficiency of the peak performance when all the data are in the L3 cache. The sustained 
speed averaged over all HMC steps is about 15 % indicating significant overheads due to a limited 
bandwidth to the off-chip memory, and to linear computations with quark vectors in the low 
mode preconditioning and so on. 

In Table El we summarize the number of .D\y multiplications iV mu it per trajectory, which 
serves as a machine independent measure of the simulation cost. This is compared with iV mu it 
at each HMC step in Fig. [THJ As expected, calculations of the overlap forces F\ and F% spend 
a large part of the total CPU time especially at small quark masses m < 0.050. Note also that 
the costs to calculate the two forces are of the same order: in other words, they are reasonably 
balanced with our choice of m' and ta. While Fy? is calculated in the inner-most loop of our 
MD integration, its computational cost turns out to be negligible in HMC-4D, and is not large 
even in HMC-5D, where the overlap solver is accelerated by the 5D algorithm. 

In simulations with HMC-5D, the noisy Metropolis test also needs a substantial fraction of 
the total time. This is because it has to invoke the 4D solver with the strict stopping condition 
to calculate the probability Eq. (|28[) accurately, whereas other HMC steps are implemented with 
the much faster 5D solver. We note that this step is removed in our latest 2+1-flavor simulations 
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Figure 19: Comparison of iV mu it between HMC-4D and HMC-5D. Left panel shows -/V mu i t only 
for Tp t \ and Tp2, whereas right panel is -/V mu i t for all HMC steps. 



algorithm 




O^mult 


HMC-4D 


0.57(3) 


0.68(2) 


HMC-5D 


0.368(3) 


0.710(3) 



Table 9: Fit parameters to Eq. ([4"6"]1 for two algorithms HMC-4D and 5D. 



by incorporating the low-mode preconditioning into the 5D solver [15,16]. 

Figure [TU] shows a comparison between HMC-4D and HMC-5D in total iV mu it and that for 
Tp t % and Tpp. To take the difference in Phmc into account, JV mu it in this figure is corrected 
by a factor At /At', where At' is the step size corresponding to Phmc = 0.8 estimated by 
assuming Eq. (|45p and AH oc At 4 . Due to the rough approximation and the lack of the low- 
mode preconditioning for sgnfi^w] hi D' , At have to be decreased by roughly 50% when the 
algorithm is switched from HMC-4D to HMC-5D with Phmc kept fixed. Even with this overhead, 
we observe about factor of 2 reduction in iV mu it for Tp t \ and Tpp. The noisy Metropolis test 
reduces the net gain to roughly 50% at all the simulated quark masses. We note in passing that 
the CPU time summarized in Tables [T] and [2] shows slightly better acceleration than in iV mu it at 
m> 0.035. This is because the low- mode preconditioning leading to the overhead mentioned at 
the beginning of this subsection is switched off in the 5D solver. 

For future reference, we fit m dependence of the corrected -/V mu i t into a simple power law 

N ma it = c m ultm- a ^. (46) 

Fit parameters are summarized in Table [9l While data at small m are subject to the finite size 
effects as discussed in Sec. I4.4[ the fit parameters do not change significantly if we discard the 
data at m< 0.025 from the fit. Note that this is the cost per trajectory and the m dependence of 
the autocorrelation, which is not clear with our statistics, is not taken into account. Thanks to 
the improved algorithms, iV mu ] t has a much milder m dependence than m~ 2 , which was employed 
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Figure 20: Effective potential V e g(r) at r = 2 and 3\/3. Left and right panels show data at 
m = 0.015 and 0.050. 



to estimate the simulation cost with the standard HMC in Ref. [92]. 

In Table we also list iV mu it in the non-trivial topological sectors. At least at the simulated 
quark mass m = 0.050, we have not observed a substantial Q dependence of iV mu it. 



5 static quark potential 

We calculate the static quark potential to fix the lattice spacing through the Sommer scale [50]. 
The temporal Wilson loops W(r, t) are measured up to t = N t /2 and r = \/3N s /2 with the spatial 
Wilson line parallel to one of the following six directions 

(1,0,0), (1,1,0), (2,1,0), (1,1,1), (2,1,1), (2,2,1). (47) 

Gauge configurations separated by 10 HMC trajectories are smeared twenty times using a 
method proposed in Ref. [93], and we measure W(r,t) every four smearing steps. The com- 
putational cost of this measurement is not large: it takes about 2 minutes per configuration on 
a single node of SRI 1000 with the sustained speed of 30%. 

We determine the static potential V(r) from the correlated fit 

W(r,t) = C(r) exp[-V(r)t] (48) 

at the number of the smearing steps which gives the maximum value of the overlap to the ground 
state C(r). The fit range [£ m in, t max ] is set to [3,5] by inspecting t-dependence of the effective 
potential 

V eS (r) = hi[W(r,t)/W(r,t + l)]. (49) 
Examples of V c s(r) are shown in Fig. [20j and V(r) is plotted as a function of r in Fig. [2TJ 
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Figure 21: Static quark potential V(r) at m = 0.015 (left panel) 0.050 (right panel). The solid 
line shows the fit of Eq. (|50|) , whereas figure legend represents the direction of the spatial Wilson 
line listed in Eq. (|37j) . 
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Figure 22: Fit range [r m ; n ,r max ] dependence of fit parameters a and a in Eq. (|50|) at m = 0.015. 
Our choice r m - m = ^/6 and r max = 8\/3 is shown by dashed lines. 

Table 10: Fit parameters in Eq. (|50|) and ro from Eq. (|52|) . The first and second errors are 
statistical and systematic, respectively. 






m 


Q 


V 


a 


a 


^0 


2.30 


0.015 





0.786(3)(15) 


0.403(5)(28) 


0.0740(6)(26) 


4.103(14)(51) 


2.30 


0.025 





0.776(4)(21) 


0.389(5) (36) 


0.0763(7)(37) 


4.064(13)(59) 


2.30 


0.035 





0.769(4)(22) 


0.381(5)(38) 


0.0780(6)(34) 


4.032(10) (49) 


2.30 


0.050 





0.760(3)(16) 


0.375(5)(30) 


0.0812(7)(26) 


3.963(11)(42) 


2.30 


0.070 





0.756(4)(25) 


0.373(6) (42) 


0.0832(7)(40) 


3.917(11)(40) 


2.30 


0.100 





0.749(4)(18) 


0.368(5)(30) 


0.0864(7)(30) 


3.852(10)(35) 


2.30 


0.050 


-2 


0.759(7)(24) 


0.370(11)(44) 


0.0803(12)(34) 


3.993(16)(29) 


2.30 


0.050 


-4 


0.758(6)(20) 


0.368(10)(36) 


0.0811(12)(29) 


3.976(19)(36) 
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Figure 23: Sommer scale ro as a function of r m ; n at m = 0.015 (top panel) and 0.050 (bottom 
panel) . We set r max = 8\/3. 

We do not observe any clear sign of the string breaking even at our smallest quark mass 
~ m s /6 possibly due to small overlap of the spatial Wilson line to the two static-light meson 
state. We therefore fit V(r) to the conventional form with the perturbative Coulomb and the 
linear confinement terms 

V(r) = Vo-a/r + ar. (50) 

The fit range is set to [r m j n , r max ] = [v6, 8v3] at all quark masses from the stability of a and 
a against the choice of the fit range shown in Fig. [2"2l Fit results are summarized in Table [TUJ 
Systematic errors due to the choice of the fit ranges are estimated from the (maximum) change 
in the fit parameters by shifting \t m \ n , t max \ to [4, 6] or varying r m \ n and r max in ranges r m j n G [2, 3] 
and r max £ [8\/2, 8v3~] • These are added in quadrature in Table [TO]. The fit curves are shown in 
Fig. [2D 

The Sommer scale ro is defined through the derivative of V(r) in the intermediate region of 
r [50] 

rldV{r)/dr\ r=ro = 1.65. (51) 
We fix ro in our simulations through the parametrization Eq. (|50l) 



/1.65-a . . 

r = (52) 

instead of the numerical derivative. In Fig. [22l we observe that r m j n dependence of a is large 
and correlated to that of a at r m j n < 2. It turns out that these uncertainties partially cancel 
each other in the ratio Eq. ([52]) leading to a mild r m i n dependence of ro shown in Fig. [23j 
Therefore, as intended in Ref. [50], ro provides a more reliable estimate of the lattice scale than 
the previously-used input ^fo even through the parametrization Eq. (j50l) over the wide region 
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Table 11: Fit parameters in Eqs. (|53h and (|54|) . The first and second errors are statistical and 
systematic. 



fit form 


X 2 /dof 


c (,) 


c (/) 




Eq. dS3D 


1.60 


0.1184(3)(17) 


0.0914(49)(57) 




Eq. (J33D 


0.47 


0.1172(6)(17) 


0.145(24)(8) 


-0.45(20)(2) 


Eq. §1D 


2.00 


8.43(2)(12) 


-5.93(32)(50) 




Eq. ipj) 


0.46 


8.52(4)(13) 


-10.0(1.6)(0.8) 


34(13)(3) 



# a vs m: linear 
■ a vs m: quad. 




Figure 24: Chiral extrapolation of a{m) with linear form of Eq. (j53[) (circles). Extrapolated 
values with other fitting forms in Eqs. ([53j) (square) and (f54"P (triangles) are also plotted. Two 
error bars for these symbols show statistical and total errors. 

of r. Our numerical results are summarized in Table [TU1 We note that r$ from three topological 
sectors are consistent with each other within their statistical accuracy. Its Q dependence is 
therefore ignored in the following analysis. 



We employ an input ro = 0.49 fm to fix the scale. A quantity a(m) = 0.49/Vo(m) is then 
extrapolated to the chiral limit testing the following fitting functions up to quadratic order 

a(m) = Co + c\ m (+ C2 m 2 ), (53) 
a(m)" 1 = c' + c[m (+c' 2 m 2 ). (54) 

Fit parameters are summarized in Table HU Since we have accurate data in the wide range of 
m, the lattice spacing in the chiral limit a = a(0) is very stable against the choice of the fitting 
function as plotted in Fig. O We obtain 



a = 0.1184(3)(17)(12) fm, (55) 

where the central value is from the linear form of Eq. ([53]) and the first error is statistical. The 
second error is due to the choice of the fit ranges for Eqs. (|48p and (|50p . The third represents the 
uncertainty due to the choice of the chiral extrapolation form and estimated by the maximum 
deviation in a from other three forms Eqs. (f53l) (quadratic) and (JE 
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Figure 25: Lattice spacing determined from r$ in simulations with Iwasaki gauge action. Circles 
are our estimate Eq. (j55f) and those in Ref. [57] and our preparatory study. These are compared 
with results in pure gauge theory [99] (solid line), only with extra fermions [54] (triangles), and 
with an improved Wilson fermions [87] (squares). 



Inclusion of dynamical quarks into simulations generally makes us decrease (3 to keep the lattice 
spacing fixed. The magnitude of the (3 shift depends on the fermion formulation. A sizable 
negative shift, or too large bare coupling in other words, could cause problematic lattice artifacts: 
for instance, one may suffer from a remnant of the fundamental-adjoint phase transition [94]. In 
practice, some evidence of non-trivial phase structure has been found in previous unquenched 
simulations even at relatively fine lattice spacing a~0.1 fm [95-97]. 

In Fig. I251 we compare the lattice spacing determined from tq in our and previous simulations 
with the Iwasaki gauge action. The (3 shift due to the extra- Wilson fermions is not expected to 
be large, since effects of their high modes are cancelled in the ratio Eq. ([6]). This is supported by 
the one-loop calculation of the vacuum polarization function in Ref. [47], and Fig. [251 provides 
a non-perturbative confirmation. 

The figure also shows that the dynamical overlap fermions lead to small (3 shift, which is in a 
good accordance with an one-loop calculation in Ref. [98]. The net shift is substantially smaller 
than that from the tadpole-improved clover fermions. Therefore the (3 shift is less problematic 
in dynamical overlap simulations even with the unphysical fermions, and this is also likely the 
case in three flavor QCD. 

6 Locality 

The locality of the overlap operator D is closely related to the properties of low-lying modes 
(^w,(ci%,/:) of -ffw- It is proved in Ref. [100] that D is exponentially local \D(x, y)\ oce~^ x ~ y ^ 1 if 
l^w,fc| h as a positive lower bound. This does exist in our simulations by the use of the auxiliary 
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Figure 26: Function /w.i( r ) at m = 0.025. The figure legend shows windows in |Aw|- The 
dashed lines show the fit Eq. (1581). 



determinant det[A\y]- 

The central concern is therefore the size of the localization range I, which should be smaller 
than the QCD scale Aqcd- In Refs - [101,102], it is argued that the range of D is characterized 
by two sets of eigenmodes of i?w : i) localized low-lying modes, whose maximum eigenvalue is 
denoted by Aw in the following, and ii) extended modes with higher eigenvalues. It leads to a 
conjecture 

\D(x,y)\ ~ A w p(Aw)exp 

where is the localization length of the localized modes, and p represents the spectral density. 
The parameter Aw,c is the so-called mobility edge, which separates the localized and extended 
modes. The prefactor of the first term follows from a steep rise in p observed in Refs. [101, 102]. 
The extended modes govern the localization properties of D through A\y,c provided that C S> 
Awp(Aw) and A w , c < (2/ Wii ) _1 . 

We estimate A\y,c in our simulations in the following steps. First, we locate the lattice site 
yk, where the fc-th lowest mode has its maximum magnitude r}k(x) =«w,k(i) ^W,k(^)- Then a 
function characterizing its decay is obtained by the average 

/w,fc(r) = j^j-z Yl ( 57 ) 

^ x,\x— yh\=T 

where iV p t(r) represents the number of lattice points which have the same distance r from y^. 
Since the spectrum of Hyj depends on the gauge configuration, we consider a range < | Aw I < 0-3, 
which is divided into windows with its size of AAw = 0.3/125, and /w,k( r ) is averaged over the 
eigenmodes in each window. We calculate /w,i( r ), where i is now a window index, at m = 
0.025 and 0.050 using 10-40 configurations separated by 10 trajectories. Because of the small 
statistics, we set the bin size to 1 configuration, which possibly underestimates the statistical 
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Figure 27: Inverse of localization length for low-lying modes of as a function of their 
eigenvalue |A\y|- 




Figure 28: Function f(r) at m = 0.025 as a function of distance r. The solid line shows the 
exponential fit f(r)oce~ r ^ 1 . 



error quoted in this section. An example of /w,i( r ) is plotted in Fig. [26l Generally speaking, 
low modes decay exponentially at large r and the decay rate decreases as |Aw| increases. 



The localization length at i-th window Z\y(|Aw|i) is determined by fitting /w,i( r ) at large r 



to 



/w,i(j) 



Cj exp 



Zw(|Aw|j 



(58) 



Its | Aw | dependence is plotted in Fig. [571 The mobility edge Aw,c is then estimated as |A\y| at 
which /w(|Aw|) _1 vanishes. It turns out that Aw,c has small m dependence but is roughly 0.33. 
550 MeV in physical units from our estimate of a in Eq. (1551) . 



l 

A W,c 



At m = 0.025, we also study the localization properties directly from the overlap operator 
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multiplied to a point like quark vector 

f(r) = max D(x,x')5(x' - y) \ . (59) 

x, \x-y\i=r y-J J 

Here we use the taxi driver distance \x — y\\ = \x^ — y M | to avoid underestimating the 
localization range I. We obtain 

r 1 = 796(2) MeV (60) 

by an exponential fit f(r) oc e~ T l l shown in Fig. [28l Both of Aw,c and I therefore suggest that 
the overlap operator is exponentially local with a localization range smaller than Aq^, d in our 
simulations. 



7 Conclusions 



In this article, we simulate two-flavor QCD with dynamical overlap quarks on the reasonably 
large (1.9 fm) and fine (a = 0.12 fm) lattice. The high statistics of 10,000 trajectories are 
accumulated at sea quark masses down to m s /6. The key step leading to such large-scale 
simulations is the suppression of the (near-)zero modes of Hyj by the auxiliary determinant. This 
enables us to use relatively cheap approximation of sgn[Hw] and also to avoid the substantial 
overhead to deal with the discontinuity of the overlap action. The use of the 5D CG algorithm, 
the Hasenbusch mass preconditioning and the multiple time scale MD integration also reduces 
the simulation cost to a large extent. 

Dynamical overlap simulations are still computationally demanding compared to the domain- 
wall fermions [16]. The complexity of the overlap formulation however suggests that there is 
much room of improvement in the implementation of HMC. The low- mode preconditioning 
for the 5D solver is developed after this study and implemented in our latest runs [15, 16]. 
Further improvement in the solver algorithm, especially in the 5D solver (or alternatives) to 
invert (D^D), is a central concern for pushing simulations to larger volumes. The test of MD 
integration schemes with less discretization error and /or a further tuning of the HMC parameters 
and the unit trajectory length are also interesting subjects to be studied. 

We are now studying various non-perturbative aspects of two-flavor QCD using the generated 
gauge ensembles. The chiral condensate is one of the most fundamental parameters in ChPT and 
has been determined in Refs. [57,58]. Studies of the low-lying hadron spectrum [103], the kaon 
B parameter [104] and the pion form factor [105] are in progress with paying particular attention 
to the consistency of their chiral behavior with ChPT. Finite size corrections based on ChPT 
are also important issue in these studies. Our calculation of the topological susceptibility [65] is 
an important step to study the nature of the QCD vacuum in fixed topological sectors. The pion 
mass splitting through the vector and axial-vector current correlators [106] is an example for 
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which the exact chiral symmetry is crucial and might be difficult to study even with the domain- 
wall fermions. Finally, our simulations have been already extended to three-flavor QCD [15, 16] 
for fully realistic studies of QCD. 

Acknowledgment 

Numerical simulations are performed on Hitachi SRI 1000 and IBM System Blue Gene Solution 
at High Energy Accelerator Research Organization (KEK) under a support of its Large Scale 
Simulation Program (No. 06-13 and 07-16). We thank J. Doi, H. Samukawa and S. Shimizu 
of IBM Japan Tokyo Research Laboratory for assembler coding on the Blue Gene computer. 
This work is supported in part by the Grant-in- Aid of the Ministry of Education (No. 17340066, 
17540259, 17740171, 18034011, 18340075, 18740167, 18840045, 19540286 and 19740160). The 
work of HF is also supported by Nishina Memorial Foundation. 

References 

[1] T. Takaishi and Ph. de Forcrand, Nucl. Phys. B (Proc.Suppl.) 94, 818 (2001) [hep- 
lat/0011003]; Int. J. Mod. Phys. C 13, 343 (2002) [arXiv:hep-lat/0108012]. 

[2] S. Aoki et al. (JLQCD collaboration), Phys. Rev. D 65, 94507 (2002) [arXiv:hep- 
lat/0112051]. 

[3] J. C. Sexton and D. H. Weingarten, Nucl. Phys. B 380, 665 (1992). 

[4] M. Hasenbusch, Phys. Lett. B 519 (2001) 177 [arXiv:hep-lat/0107019]. 

[5] M. Hasenbusch and K. Jansen, Nucl. Phys. B 659, 299 (2003) [arXiv:hep-lat/02 11042]. 

[6] M. Liischer, JHEP 0305, 052 (2003) [arXiv:hep-lat/0304007]. 

[7] M. Liischer, Comput. Phys. Commun. 165, 199 (2005) [arXiv:hep-lat/0409106]. 

[8] M. A. Clark and A. D. Kennedy, Phys. Rev. Lett. 98, 051601 (2007) [arXiv:hep- 
lat/0608015]. 

[9] C. Bernard et al. (MILC collaboration), Phys. Rev. D 64, 054506 (2001) [arXiv:hep- 
lat/0104002]; C. Aubin et al. (MILC collaboration), ibid. D 70, 094505 (2004) 
[arXiv:hep-lat/0402030]; for a latest report, see C. Bernard et al. (MILC collaboration), 
arXiv:0710.1118 [hep-lat]. 

[10] D. Kadoh et al. (PACS-CS collaboration), PoS (LATTICE 2007), 109 [arXiv:0710.3467 [hep- 
lat]]; N. Ukita et al. (PACS-CS collaboration), PoS (LATTICE 2007), 138 [arXiv:0710.3462 
[hep-lat]]; Y. Kuramashi, PoS (LATTICE 2007), 017 [arXiv:071 1.3938 [hep-lat]]. 

[11] L. Del Debbio, L. Giusti, M. Liischer, R. Petronzio and N. Tantalo, JHEP 0702, 056 (2007) 
[arXiv:hep-lat/0610059]; ibid. 0702, 082 (2007) [arXiv:hep-lat/0701009]. 



33 



[12] M. Gockeler et al. (QCDSF and UKQCD collaborations) PoS (LATTICE 2006), 179 
[arXiv:hep-lat /0610066] . 

[13] Ph. Boucaud et al. (ETM collaboration), Phys. Lett. B 650, 304 (2007) [arXiv:hep- 
lat/0701012]; C. Urbach, arXiv:0710.1517 [hep-lat]. 

[14] C. Allton et al. (RBC and UKQCD collaborations), Phys. Rev. D 76, 014504 (2007) 
[arXiv:hep-lat/0701013]; P. Boyle, PoS (LATTICE 2007), 005 [arXiv:0710.5880 [hep-lat]]. 

[15] S. Hashimoto et al. (JLQCD collaboration), PoS (LATTICE 2007), 101 [arXiv:0710.2730 
[hep-lat]]. 

[16] H. Matsufuru, PoS (LATTICE 2007) 018 [arXiv:0710.4225 [hep-lat]]. 
[17] S. Diirr et al, arXiv:0710.4769 [hep-lat]. 

[18] T. Ishikawa et al. (CP-PACS and JLQCD collaborations), arXiv:0704.1937 [hep-lat]. 

[19] S. R. Sharpe and R. Singleton Phys. Rev. D 58, 074501 (1998) [arXiv:hep-lat/9804028]. 

[20] G. Rupak and N. Shoresh, Phys. Rev. D 66, 054503 (2002) [arXiv:hep-lat/0201019]. 

[21] S. Aoki, Phys. Rev. D 68, 054508 (2003) [arXiv:hep-lat/0306027]. 

[22] W. Lee and S. Sharpe, Phys. Rev. D 60, 114503 (1999) [arXiv:hep-lat/9905023]. 

[23] C. Bernard, Phys. Rev. D 65, 054031 (2002) [arXiv:hep-lat/0111051]. 

[24] C. Aubin and C. Bernard, Phys. Rev. D 68, 034014 (2003) [arXiv:hep-lat/0304014]; ibid. 
D 68, 074011 (2003) [arXiv:hep-lat/0306026]. 

[25] M. Delia Morte, R. Hoffmann, F. Knechtli and U. Wolff (ALPHA collaboration), Comput. 
Phys. Commun. 165, 49 (2005) [arXiv:hep-lat/0405017]. 

[26] L. Del Debbio, L. Giusti, M. Liischer, R. Petronzio and N. Tantalo, JHEP 0602,011 (2006) 
[arXiv:hep-lat/0512021]. 

[27] D. B. Kaplan, Phys. Lett. B 288, 342 (1992) [arXiv:hcp-lat/9206013]. 

[28] Y. Shamir, Nucl. Phys. B 406, 90 (1993) [arXiv:hep-lat/9303005]. 

[29] V. Furman and Y. Shamir, Nucl. Phys. B 439, 54 (1995) [arXiv:hep-lat/9405004]. 

[30] Y. Aoki et al. (RBC collaboration), Phys.Rev. D 72, 114505 (2005) [arXiv:hep-lat/0411006]. 

[31] R. Narayanan and H. Neuberger, Nucl. Phys. B443, 305 (1995) [arXiv:hep-th/9411108]. 

[32] H. Neuberger, Phys. Lett. B 417, 141 (1998) [arXiv:hep-lat/9707022]; ibid. B 427, 353 
(1998) [arXiv:hep-lat/9801031]. 

[33] P. H. Ginsparg and K. G. Wilson, Phys. Rev. D 25, 2649 (1982). 



34 



[34] P. Hasenfratz, V. Laliena and F. Nidermayer, Phys. Lett. B 427, 125 (1998) [arXiv:hep- 
lat/9801021]. 

[35] P. Hasenfratz, Nucl. Phys. B 525, 401 (1998) [arXiv:hep-lat/9802007]. 

[36] M. Liischer, Phys. Lett. B 428, 342 (1998) [arXiv:hep-lat/9802011]. 

[37] P. Hasenfratz, Nucl. Phys. B (Proc.Suppl.) 63 , 53 (1998). [arXiv:hep-lat/9709110]. 

[38] C. Gattringer, Phys. Rev. D 63, 114501 (2001) [arXiv:hep-lat/0003005]. 

[39] W. Bietenholz and I. Hip, Nucl. Phys. B 570, 423 (2000) [arXiv:hep-lat/9902019]. 

[40] A. Hasenfratz, P. Hasenfratz, D. Hierl, F. Niedermayer and A. Schafer, PoS (LATTICE 
2006), 178 [arXiv:hep-lat/0610096]. 

[41] C. B. Lang, P. Majumdar and W. Ortner, Phys. Rev. D 73, 034507 (2006) [arXiv:hep- 
lat/0512014]; R. Frigori et al, Pos (LATTICE 2007), 114 [arXiv:0709.4582 [hep-lat]]. 

[42] A. Bode, U. M. Heller, R. G. Edwards and R. Narayanan, arXiv:hep-lat/9912043. 

[43] Z. Fodor, S. D. Katz and K. K. Szabo, JHEP 0408, 003 (2004) [arXiv:hep-lat/0311010]. 

[44] N. Cundy, S. Krieg, A. Frommer, Th. Lippert and K. Schilling, Nucl. Phys. B (Proc.Suppl.) 
140, 841 (2005) [arXiv:hep-lat/0409029]. 

[45] T. DeGrand and S. Schaefer, Phys. Rev. D 71, 034507 (2005) [arXiv:hep-lat/0412005]. 

[46] N. Cundy et al, arXiv:hep-lat/0502007. 

[47] H. Fukaya et al. (JLQCD collaboration), Phys. Rev. D 74, 094505 (2006) [hep-lat/0607020]. 

[48] R. Brower, S. Chandrasekharan, J. W. Negele and U.-J. Wiese, Phys. Lett. B 560, 64 
(2003) [arXiv:hep-lat/0302005]. 

[49] S. Aoki, H. Fukaya, S. Hashimoto and T. Onogi, Phys. Rev. D 76, 054508 (2007) 
[arXiv:0707.0396 [hep-lat]]. 

[50] R. Sommer, Nucl. Phys. B411 (1994) 839 [arXiv:hep-lat/9310022]. 

[51] H. Matsufuru et al. (JLQCD collaboration), PoS (LATTICE 2006), 031 [arXiv:hep- 
lat/0610026]. 

[52] S. Hashimoto el al. (JLQCD collaboration), PoS (LATTICE 2006), 052 [arXiv:hep- 
lat/0610011]. 

[53] T. Kaneko et al. (JLQCD collaboration), PoS (LATTICE 2006), 054 [arXiv:hep- 
lat/0610036]. 

[54] N. Yamada et al. (JLQCD collaboration), PoS (LATTICE 2006), 060 [arXiv:hep- 
lat/0609073]. 



35 



[55] H. Fukaya et al. (JLQCD collaboration), PoS (LATTICE 2006), 050 [arXiv:hep- 
lat/0610024]. 

[56] H. Fukaya et al. (JLQCD collaboration), PoS (LATTICE 2007), 073 [arXiv:0710.3468 [hep- 
lat]]. 

[57] H. Fukaya et al. (JLQCD collaboration), Phys. Rev. Lett. 98, 172001 (2007) [arXiv:hep- 
lat/0702003]. 

[58] H. Fukaya et al. (JLQCD and TWQCD collaborations), Phys. Rev. D 76, 054503 (2007). 
[arXiv:0705.3322 [hep-lat]] 

[59] H. Fukaya et al. (JLQCD collaboration), arXiv:071 1.4965 [hep-lat]. 

[60] R. Babich et al, JHEP 0601, 086 (2006) [arXiv:hep-lat/0509027]. 

[61] A. Ali Khan et al. (CP-PACS collaboration), Phys. Rev. D 63, 114504 (2001) [arXiv:hep- 
lat/0007014]. 

[62] Y. Iwasaki, preprint UTHEP-118 (1983), unpublished. 

[63] P. M. Vranas, arXiv:hep-lat/0001006; Phys. Rev. D 74, 034512 (2006) [arXiv:hep- 
lat/0606014]. 

[64] T. Izubuchi and C. Dawson, Nucl. Phys. B (Proc.Suppl.) 106, 748 (2002). 

[65] S. Aoki et al. (JLQCD and TWQCD collaborations), arXiv:0710.1130 [hep-lat]. 

[66] M. Golterman and Y. Shamir, Phys. Rev. D 76, 094512 (2007) [arXiv: 0705. 2928 [hep-lat]]. 

[67] N. Cundy, S. Kreig, T. Lippert and A. Schafer, arXiv:0803.0294 [hep-lat]. 

[68] N. I. Akhiezer, Theory of approximation, F. Ungar, New York, 1956. 

[69] D. Ingerman, V. Druskin and L. Knizhnerman, Comm. Pure Appl. Math. 53, 1039 (2000). 

[70] A. Frommer, S. Giisken, T. Lippert, B. Nockel and K. Schilling, Int. J. Mod. Phys. C 6, 
627 (1995) [arXiv:hep-lat/9504020]. 

[71] B. Jegerlehner, arXiv:hep-lat/9612014. 

[72] L. Giusti, C. Hoelbling, M. Liischer, H. Wittig, Comput. Phys. Commun. 153, 31 (2003) 
[arXiv:hep-lat/0212012]. 

[73] N. Cundy et al, Comput. Phys. Commun. 165 (2005) 221 [arXiv:hep-lat/0405003]. 

[74] R. Narayanan and H. Neuberger, Phys. Rev. D 62, 074504 (2000) [arXiv:hep-lat/0005004]. 

[75] A. Borici, arXiv:hep-lat/9912040; hep-lat/0402035. 

[76] R. G. Edwards, B. Joo, A. D. Kennedy, K. Orginos and U. Wenger, PoS (LAT2005) 146 
[arXiv:hep-lat/0510086]. 



36 



[77] A. Ali Khan et al. (QCDSF collaboration), Phys. Lett. B 564 (2003) 235 [arXiv:hep- 
lat/0303026]. 

[78] C. Urbach, K. Jansen, A. Shindler and U. Wenger, Comput. Phys. Commun. 174, 87 (2006) 
[arXiv:hep-lat/0506011]. 

[79] S. Schaefer and T. DeGrand, PoS (LAT2005) 140 [arXiv:hep-lat/0508025]. 

[80] G. I. Egri, Z. Fodor, S. D. Katz, K. K. Szabo, JHEP 0601, 049 (2006) [arXiv:hep- 
lat/0510117]. 

[81] A. D. Kennedy and J. Kuti, Phys. Rev. Lett. 54, 2473 (1985). 
[82] J. Noaki et al. (JLQCD collaboration), in preparation. 

[83] H. Fukaya, S. Hashimoto, T. Hirohashi, K. Ogawa and T. Onogi, Phys. Rev. D 73, 014503 
(2006) [arXiv:hep-lat/0510116]. 

[84] N. Madras and A. D. Sokal, J. Stat. Phys. 50 (1988) 109. 

[85] U. Wolff, Comput. Phys. Commun. 156 (2004) 143 [arXiv:hep-lat/0306017]. 

[86] U. Glassner et al. (SESAM collaboration), Nucl. Phys. B (Proc.Suppl.) 47, 386 (1996) 
[arXiv:hep-lat/9510001]; N. Eicker et al. (SESAM collaborate) , Phys. Lett. B 407, 290 
(1997) [arXiv:hep-lat/9704019]. 

[87] A. Ali Khan et al. (CP-PACS collaboration), Phys. Rev. D 65, 054505 (2002) [Erratum: ibid. 
D 67 059901 (2003), arXiv:hep-lat/0105015]. 

[88] S. Aoki et al. (JLQCD collaboration), Phys. Rev. D 68, 054502 (2003) [arXiv:hep- 
lat/02 12039]. 

[89] Y. Namekawa et al. (CP-PACS collaboration), Phys. Rev. D 70, 074503 (2004) [arXiv:hep- 
lat/0404014]. 

[90] B. Joo et al. (UKQCD collaboration), Phys. Rev. D 62, 114501 (2000) [arXiv:hep- 
lat/0005023]. 

[91] A. D. Kennedy, Nucl. Phys. B (Proc.Suppl.) 140, 190 (2005) [arXiv:hep-lat/0409167]. 
[92] A. Ukawa, Nucl. Phys. B (Proc.Suppl.) 106, 195 (2002). 
[93] G. S. Bali and K. Schilling, Phys. Rev. D 46, 2636 1992. 
[94] G. Bhanot, Phys. Lett. B 108, 337 (1982). 

[95] T. Blum et al, Phys. Rev. D 50, 3377 (1994) [arXiv:hep-lat/9404006]. 

[96] F. Farchioni et al, Eur. Phys. J. C 39, 421 (2005) [arXiv:hep-lat/0406039]. 

[97] S. Aoki et al. (JLQCD collaboration), Phys. Rev. D 72, 054510 (2005) [arXiv:hep- 
lat/0409016]. 

37 



[98] C. Alexandrou, H. Panagopoulos and E. Vicari, Nucl. Phys. B 571, 257 (2000) [arXiv:hep- 
lat/9909158]. 

[99] S. Takeda et al. (CP-PACS collaboration), Phys. Rev. D 70, 074510 (2004) [arXiv:hep- 
lat/0408010]. 

[100] P. Hernandez, K. Jansen and M. Liischer, Nucl. Phys. B 552, 363 (1999) [arXiv:hep- 
lat/9808010]. 

[101] M. Golterman and Y. Shamir, Phys. Rev. D 68, 074501 (2003) [arXiv:hep-lat/0306002]. 

[102] M. Golterman, Y. Shamir and B. Svetitsky, Phys. Rev. D 72, 034501 (2005) [arXiv:hep- 
lat/0503037]. 

[103] J. Noaki et al. (JLQCD collaboration), PoS (LATTICE 2007) 126 [arXiv:0710.0929 [hep- 
lat]]. 

[104] N. Yamada et al. (JLQCD collaboration), arXiv:0710.0462 [hep-lat]. 

[105] T. Kaneko et al. (JLQCD collaboration), PoS (LATTICE 2007) 148 [arXiv:0710.2390 
[hep-lat]]. 

[106] E. Shintani et al. (JLQCD collaboration), PoS (LATTICE 2007) 134 [arXiv:0710.0691 
[hep-lat]]. 



38 



